Fast large-scale clustering of protein structures using Gauss integrals

Authors

  • Tim Harder
  • Mikael Borg
  • Wouter Boomsma
  • Peter Røgen
  • Thomas Hamelryck
Abstract

Motivation: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step to find the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories.

Results: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors (introduced by Røgen and co-workers) and subsequently performing K-means clustering.

Conclusions: Compared with current methods, Pleiades dramatically reduces the time needed to perform clustering and can cluster a significantly larger number of structures, while providing state-of-the-art results. The low-energy structures generated in a typical folding study, on the order of 50,000, can be clustered within seconds to minutes.
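The second stage described in the abstract, K-means on fixed-length vectors, can be sketched as follows. This is a minimal illustration, not the Pleiades implementation: it assumes every structure has already been mapped to a Gauss integral vector, and it uses synthetic random vectors of length 30 as a stand-in for those descriptors. The names `kmeans` and `features` are hypothetical.

```python
import numpy as np

def kmeans(vectors, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on row vectors; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct input vectors.
    start = rng.choice(len(vectors), size=k, replace=False)
    centroids = vectors[start].copy()
    labels = np.full(len(vectors), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every vector to every centroid.
        diffs = vectors[:, None, :] - centroids[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for j in range(k):
            members = vectors[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Synthetic stand-in for Gauss integral vectors (assumption: length 30).
rng = np.random.default_rng(1)
blob_a = rng.normal(0.0, 0.1, size=(100, 30))
blob_b = rng.normal(5.0, 0.1, size=(100, 30))
features = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(features, k=2)
```

Because each structure is reduced to a short fixed-length vector, one K-means iteration costs time proportional to the number of structures times the number of clusters, with no pairwise superposition of structures.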

Similar resources

Computing a New Family of Shape Descriptors for Protein Structures

The large-scale 3D structure of a protein can be represented by the polygonal curve through the carbon alpha atoms of the protein backbone. We introduce an algorithm for computing the average number of times that a given configuration of crossings on such polygonal curves is seen, the average being taken over all directions in space. Hereby, we introduce a new family of global geometric measure...

Protein Clustering on a Grassmann Manifold

We propose a new method for clustering 3D protein structures. In our method, the 3D structure of a protein is represented by a linear subspace, which is generated using PCA from the set of synthesized multi-view images of the protein. The similarity of two protein structures is then defined by the canonical angles between the corresponding subspaces. The merit of this approach is that we can av...

Efficient computation of highly oscillatory integrals with Hankel kernel

In this paper, we consider the evaluation of two kinds of oscillatory integrals with a Hankel function as kernel. We first rewrite these integrals as Fourier-type integrals. By analytic continuation, these Fourier-type integrals can be transformed into integrals on [0, +∞), the integrands of which are not oscillatory and decay exponentially fast. Consequently, the transformed integr...

Automatic classification of protein structure by using Gauss integrals.

We introduce a method of looking at, analyzing, and comparing protein structures. The topology of a protein is captured by 30 numbers inspired by Vassiliev knot invariants. To illustrate the simplicity and power of this topological approach, we construct a measure (scaled Gauss metric, SGM) of similarity of protein shapes. Under this metric, protein chains naturally separate into fold clusters....

On the accurate numerical evaluation of geodetic convolution integrals

In the numerical evaluation of geodetic convolution integrals, whether by quadrature or discrete/fast Fourier transform (D/FFT) techniques, the integration kernel is sometimes computed at the centre of the discretised grid cells. For singular kernels, a common case in physical geodesy, this approximation produces significant errors near the computation point, where the kernel changes rapidly acro...

Journal:
  • Bioinformatics

Volume 28, Issue 4

Pages: -

Publication date: 2012